New C Compiler Features 


1. Long and short integers 


In the interests of portability, C integers now come in three 
flavours: `long', `short', and just plain `int'. A short integer
is 16 bits long on both the Interdata X/32 and X/16, as well as 
on the PDP-11; a long integer is 32 bits long on any machine; an 
ordinary int is the "natural" size for the host machine: 16 bits 
on Interdata X/16 and PDP-11, 32 bits on Interdata X/32. 


The new type keywords can act rather like adjectives in that
`short int' means a 16-bit integer and `long float' means the
same as `double'. Or they may stand alone, implying integer
types.

Long constants are written with a terminating `l' or `L', e.g.
"123L" or "0177777777L" or "0X56789abcdL". The latter is a hex
constant, which could also have been short; it is marked by
starting with "0X". Every fixed decimal constant larger than
32767 is taken to be long, and so are octal or hex constants
larger than 0177777 (0Xffff, or 0XFFFF if you like). A warning
is given in such a case since this is actually an incompatibility
with the older compiler. Where the constant is just used as an
initializer or assigned to something it doesn't matter. If it is
passed to a subroutine then the routine will not get what it
expected.


When a short and a long integer are operands of an arithmetic 
operator, the short is converted to long (with sign extension). 
This is true also when a short is assigned to a long. When a 
long is assigned to a short integer it is truncated at the high 
order end with no notice of possible loss of significant digits. 
This is true as well when a long is added to a 16-bit pointer 
(which includes its usage as a subscript). The conversion rules 
for expressions involving doubles and floats mixed with longs are 
the same as those for short integers, mutatis mutandis. 
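
As a rough illustration of these rules, the following sketch models
the 16-bit short and the 32-bit long with the fixed-width types
int16_t and int32_t of a present-day C compiler. The function names
`widen' and `narrow' are invented for the example, and the syntax is
that of current C rather than of the compiler described here.

```c
#include <stdint.h>

/* Widen a 16-bit value to 32 bits; the sign bit is extended. */
int32_t widen(int16_t s)
{
    return (int32_t)s;
}

/* Keep only the low-order 16 bits of a 32-bit value, which is the
   effect described above for assigning a long to a short. */
int16_t narrow(int32_t l)
{
    uint16_t low = (uint16_t)((uint32_t)l & 0xFFFFu);

    /* Reinterpret the 16 bits as a signed quantity by hand, so the
       result does not rely on implementation-defined conversion. */
    if (low & 0x8000u)
        return (int16_t)((int32_t)low - 65536);
    return (int16_t)low;
}
```

With these definitions narrow(70000) gives 4464 (that is, 70000 - 65536)
with no diagnostic, which is the silent loss of significance the text
warns about.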


A point to note is that constant expressions involving longs are 
not evaluated at compile time, and may not be used where con- 
stants are expected. Thus 
        long x {5000L*5000L};
is illegal;
        long x {5000*5000};
is legal but wrong because the high-order part is lost; but both
        long x 25000000L;
and
        long x 25.e6;
are correct and have the same meaning because the double constant 
is converted to long at compile time. 
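
To see what "the high-order part is lost" amounts to: 5000*5000 is
25000000, and 25000000 - 381*65536 = 30784, so only 30784 survives in
a 16-bit word. The sketch below checks that arithmetic with a current
C compiler; the function names are hypothetical, and the 16-bit int is
modelled with uint16_t rather than taken from any particular machine.

```c
#include <stdint.h>

/* Low-order 16 bits of a product, modelling 16-bit int arithmetic.
   The multiplication is carried out in 32 bits so that nothing
   overflows inside this function. */
uint16_t low16_product(uint16_t a, uint16_t b)
{
    uint32_t wide = (uint32_t)a * (uint32_t)b;
    return (uint16_t)(wide & 0xFFFFu);
}

/* The same product kept in a 32-bit long (small operands assumed). */
int32_t long_product(int32_t a, int32_t b)
{
    return a * b;
}
```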


2. Unsigned integers 

A new fundamental data type with keyword `unsigned' is available.
It may be used alone:

        unsigned u;

or as an adjective with `int':

        unsigned int u;

with the same meaning. There are not yet (or possibly ever)
unsigned longs, shorts or chars. The meaning of an unsigned
variable is that of an integer modulo 2^n, where n is the wordsize
of the machine. All operators whose operands are unsigned produce
results consistent with this interpretation except division and
remainder where the divisor is larger than 2^(n-1)-1; then the
result is incorrect. The dividend in an unsigned division may
however have any value (i.e. up to 2^n-1) with correct results.
Right shifts of unsigned quantities are guaranteed to be logical
shifts.


When an ordinary integer and an unsigned integer are combined
then the ordinary integer is mapped into an integer mod 2^n and
the result is unsigned. Thus, for example `u = -1' results in
assigning 65535 (on a 16-bit machine) or 4294967295 (on a 32-bit
machine) to u. This is mathematically reasonable, and also
happens to involve no run-time overhead.
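
The two values quoted above can be checked with a present-day
compiler. This is only a sketch: the 16-bit and 32-bit word sizes are
modelled with uint16_t and uint32_t, and the function names are
invented for the example.

```c
#include <stdint.h>

/* Map a signed value into the integers modulo 2^16 and modulo 2^32. */
uint16_t to_u16(int32_t v) { return (uint16_t)v; }
uint32_t to_u32(int32_t v) { return (uint32_t)v; }

/* Right shift of an unsigned quantity fills with zeros, i.e. it is
   a logical shift. */
uint16_t shr_u16(uint16_t u, unsigned n)
{
    return (uint16_t)(u >> n);
}
```

Here to_u16(-1) is 65535 and to_u32(-1) is 4294967295, and shifting
0x8000 right by 15 places leaves 1 rather than a word of sign bits.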


When an unsigned integer is assigned to a plain integer, an
(undiagnosed) overflow occurs when the unsigned integer exceeds
2^(n-1)-1.


It is intended that unsigned integers be used in contexts where 
previously character pointers were used (artificially and non- 
portably) to represent unsigned integers. 


3. Block structure. 


A sequence of declarations may now appear at the beginning of any 
compound statement in {}. The variables declared thereby are lo- 
cal to the compound statement. Any declarations of the same name 
existing before the block was entered are pushed down for the 
duration of the block. Just as in functions, as before, auto 
variables disappear and lose their values when the block is left; 
static variables retain their values. Also according to the same 
rules as for the declarations previously allowed at the start of 
functions, if no storage class is mentioned in a declaration the 
default is automatic. 


Implementation of inner-block declarations is such that there is 
no run-time cost associated with using them. 
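
A small sketch of the scope rules just described, written in the
syntax of current C. The function names are hypothetical.

```c
/* Returns the value of the outer x after an inner block has declared
   and modified its own x. */
int outer_after_inner(void)
{
    int x = 1;
    {
        int x = 2;      /* pushes down the outer x for this block */
        x = x + 10;     /* changes only the inner x */
    }
    return x;           /* the outer x is visible again, still 1 */
}

/* A static declared in an inner block keeps its value between calls. */
int calls_so_far(void)
{
    {
        static int count = 0;
        count = count + 1;
        return count;
    }
}
```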


4. Initialization (part 1) 


This compiler properly handles initialization of structures so 
the construction 


        struct { char name[8]; char type; float val; } x
                { "abc", 'a', 123.4 };


compiles correctly. In particular it is recognized that the 
string is supposed to fill an 8-character array, the `a' goes 
into a character, and that the 123.4 must be rounded and placed 
in a single-precision cell. Structures of arrays, arrays of 
structures, and the like all work; a more formal description of 
what is done follows.


<initializer> ::= <element> 
<element> ::= <expression> | <element> , <element> | 
{ <element> } | { <element> , } 


An element is an expression or a comma-separated sequence of
elements possibly enclosed in braces. In a brace-enclosed sequence,
a comma is optional after the last element. This very ambiguous
definition is parsed as described below. "Expression" must of
course be a constant expression within the previous meaning of
the Act.


An initializer for a non-structured scalar is an element with ex- 
actly one expression in it. 


An "aggregate" is a structure or an array. If the initializer 
for an aggregate begins with a left brace, then the succeeding 
comma-separated sequence of elements initialize the members of 
the aggregate. It is erroneous for the number of members in the 
sequence to exceed the number of elements in the aggregate. If 
the sequence has too few members the aggregate is padded. 


If the initializer for an aggregate does not begin with a left 
brace, then the members of the aggregate are initialized with 
successive elements from the succeeding comma-separated sequence. 
If the sequence terminates before the aggregate is filled the ag- 
gregate is padded. 


The "top level" initializer is the object which initializes an 
external object itself, as opposed to one of its members. The 
top level initializer for an aggregate must begin with a left 
brace. 


If the top-level object being initialized is an array and if its 
size is omitted in the declaration, e.g. "int a[]", then the size
is calculated from the number of elements which initialized it. 


Short of complete assimilation of this description, there are two 
simple approaches to the initialization of complicated objects. 
First, observe that it is always legal to initialize any object 
with a comma-separated sequence of expressions. The members of 
every structure and array are stored in a specified order, so the 
expressions which initialize these members may if desired be laid 
out in a row to successively, and recursively, initialize the 
members. 


Alternatively, the sequences of expressions which initialize
arrays or structures may uniformly be enclosed in braces.
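
The two approaches can be compared side by side. The sketch below
uses the syntax of current C (with `=' before the initializer), which
accepts both layouts; the names `item', `braced', `flat', `padded' and
`same_item' are invented for the example, and the statement that the
padding is zero is true of current C rather than something stated
above.

```c
#include <string.h>

struct item { char name[8]; char type; float val; };

/* Every array and structure enclosed in its own braces. */
struct item braced[2] = {
    { "abc", 'a', 123.4f },
    { "xyz", 'b', 5.0f },
};

/* The same members laid out in a row; only the top-level braces remain. */
struct item flat[2] = { "abc", 'a', 123.4f, "xyz", 'b', 5.0f };

/* Size omitted: it is computed from the number of initializing elements. */
int a[] = { 1, 2, 3 };

/* Too few members: the rest of the aggregate is padded. */
int padded[4] = { 7 };

/* Compare two items member by member. */
int same_item(const struct item *p, const struct item *q)
{
    return strcmp(p->name, q->name) == 0
        && p->type == q->type
        && p->val == q->val;
}
```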


5. Initialization (part 2)


Declarations, whether external, at the head of functions, or in
inner blocks may have initializations whose syntax is the same as 
previous external declarations with initializations. The only 
restrictions are that automatic structures and arrays may not be 
initialized (they can't be assigned either); nor, for the moment 
at least, may external variables when declared inside a function. 


The declarations and initializations should be thought of as oc- 
curring in lexical order so that forward references in initiali- 
zations are unlikely to work. E.g., 


        { int a a;
          int b c;
          int c 5;
        }


Here a is initialized by itself (and its value is thus unde- 
fined); b is initialized with the old value of c (which is either 
undefined or any c declared in an outer block). 
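
The case of b and c can be tried out in current C, where the same
lexical-order rule applies. This is a sketch only; the function name
is hypothetical, and the self-initialized `a' is left out because
reading it would be meaningless.

```c
int inner_b(void)
{
    int c = 5;              /* a c declared in an outer block */
    {
        int b = c;          /* the inner c does not exist yet, so this
                               reads the outer c */
        int c = 7;          /* from here on, c means the inner variable */
        return b * 100 + c; /* 5 * 100 + 7 */
    }
}
```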


6. Bit fields 
A declarator inside a structure may have the form 
<declarator> : <constant> 


which specifies that the object declared is stored in a field the 
number of bits in which is specified by the constant. If several 
such things are stacked up next to each other then the compiler 
allocates the fields from right to left, going to the next word 
when the new field will not fit. The declarator may also have 
the form 


        : <constant>


which allocates an unnamed field to simplify accurate modelling
of things like hardware formats where there are unused fields.
Finally,

        : 0

means to force the next field to start on a word boundary.


The types of bit fields can be only "int" or "char". The only
difference between the two is in the alignment and length
restrictions: no int field can be longer than 16 bits, nor any char
longer than 8 bits. If a char field will not fit into the
current character, then it is moved up to the next character
boundary.


Both int and char fields are taken to be unsigned (non-negative) 
integers. 

Bit-field variables are not quite full-class citizens. Although
most operators can be applied to them, including assignment
operators, they do not have addresses (i.e. there are no bit
pointers) so the unary & operator cannot be applied to them. For
essentially this reason there are no arrays of bit field
variables.


There are two woes in the implementation: addition (=+) applied
to fields can result in an overflow into the next field; it
is not possible to initialize bit fields.
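
The three declarator forms look like this in a structure. The sketch
is written for a current C compiler, where fields are declared
`unsigned', the order of allocation within a word is left to the
implementation, and (unlike the compiler described here) a structure
containing fields may be initialized. The structure and function
names are hypothetical.

```c
struct device_word {
    unsigned ready : 1;    /* a 1-bit field */
    unsigned       : 3;    /* unnamed field: models unused bits */
    unsigned unit  : 4;    /* a 4-bit field */
    unsigned       : 0;    /* force the next field onto a new boundary */
    unsigned count : 8;
};

/* Store a value in the 4-bit field and read it back. */
unsigned store_unit(unsigned v)
{
    struct device_word w = { 0 };

    w.unit = v;            /* kept modulo 2^4, since the field is unsigned */
    return w.unit;
}
```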


7. Macro preprocessor 


The preprocessor handles `define' statements with formal
arguments. The line

        #define macro(a1,...,an) ...a1...an...

is recognized by the presence of a left parenthesis following the
defined name. When the form

        macro(b1,...,bn)

is recognized in normal C program text, it is replaced by the
definition, with the corresponding bi actual argument string
substituted for the corresponding ai formal arguments. Both actual
and formal arguments are separated by commas not included in
parentheses; the formal arguments have the syntax of names.

Macro expansions are no longer surrounded by spaces. Lines in
which a replacement has taken place are rescanned until no macros
remain.
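
A short sketch of macros with arguments, as accepted by a current C
preprocessor. All of the macro and function names are invented for
the example; the extra parentheses in the definitions are ordinary
defensive practice, not something the text above requires.

```c
#define square(x)   ((x) * (x))
#define max(a, b)   ((a) > (b) ? (a) : (b))

/* The replacement text is rescanned, so one macro may use another. */
#define max_square(a, b)  max(square(a), square(b))

int demo_max_square(int p, int q)
{
    return max_square(p, q);
}

int add(int x, int y) { return x + y; }

/* A comma inside parentheses does not separate actual arguments:
   add(1, 2) below is a single argument to square. */
int demo_nested_commas(void)
{
    return square(add(1, 2));
}
```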


The preprocessor has a rudimentary conditional facility. A line 
of the form 


        #ifdef name

is ignored if `name' is defined to the preprocessor (i.e. was the
subject of a `define' line). If name is not defined then all
lines through a line of the form

        #endif

are ignored. A corresponding form is

        #ifndef name
        ...
        #endif

which ignores the intervening lines if `name' is defined.


Under unix the name `unix' is predefined and replaced by itself
to aid writers of C programs which are expected to be transported
to other machines with C compilers.

In connection with this, there is a new option to the cc command:

        cc -Dname

which causes `name' to be defined to the preprocessor (and
replaced by itself). This can be used together with conditional
preprocessor statements to select variant versions of a program
at compile time.
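
Selecting a variant might look like the sketch below. VARIANT and
TABLE_SIZE are hypothetical names chosen for the example; the first
would be supplied on the command line as `cc -DVARIANT'. (Current
compilers define such a name as 1 rather than as itself, which makes
no difference to the tests used here.)

```c
/* Compiled with -DVARIANT the table is large; otherwise it is small. */
#ifdef VARIANT
#define TABLE_SIZE 1024
#endif

#ifndef VARIANT
#define TABLE_SIZE 512
#endif

int table_size(void)
{
    return TABLE_SIZE;
}
```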


The previous two facilities (macros with arguments, conditional 
compilation) were actually available in the 6th Edition system, 
but undocumented. New in this release of the cc command is the 
ability to nest `include' files. Preprocessor include lines may
have the new form

        #include <file>

where the angle brackets replace double quotes. In this case,
the file name is prepended with a standard prefix, namely
`/usr/include'. It is intended that commonly-used include files
be placed in this directory; the convention reduces the
dependence on system-specific naming conventions. The standard prefix
can be replaced by the cc command option `-I':

        cc -Iotherdirectory


8. Registers


A formal argument may be given the storage class `register'. When
this occurs the save sequence copies it from the place the caller
left it into a fast register; all usual restrictions on its use
are the same as for ordinary register variables.

Now any variable inside a function may be declared `register'; if
the type is unsuitable, or if there are more than three register
declarations, then the compiler makes it `auto' instead. The
restriction that the & operator may not be applied to a register
remains.


9. Mode declarations


A declaration of the form

        typedef type-specifier declarator ;

makes the name given in the declarator into the equivalent of a
keyword specifying the type which the name would have in an
ordinary declaration. Thus

        typedef int *iptr;

makes `iptr' usable in declarations of pointers to integers;
subsequently the declarations

        iptr ip;

and

        int *ip;

would mean the same thing. Type names introduced in this way
obey the same scope rules as ordinary variables. The facility is
new, experimental, and probably buggy.
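
The equivalence of the two declarations can be seen in a small
sketch, written for a current C compiler. The function name and the
second pointer `jp' are invented for the example.

```c
typedef int *iptr;

/* Both pointers below have the type "pointer to int". */
int read_both_ways(void)
{
    int n = 42;
    iptr ip = &n;       /* declared through the typedef name */
    int *jp = ip;       /* declared directly; no conversion is needed */

    return *ip + *jp;
}
```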


10. Top-level static


The storage class `static' can be specified in top-level
declarations. Names declared thereby are global to the rest of the file
in which they appear (except as modified by block structure, see
3 above) but are unconnected with names in programs in other
files. This may be useful to systems of library routines which
want to keep their internal interfaces hidden.
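
A library file might hide its internals along the following lines.
This is a sketch only, and every name in it is hypothetical.

```c
/* Internal to this file: other files cannot refer to these names. */
static int next_id = 0;

static int bump(void)
{
    next_id = next_id + 1;
    return next_id;
}

/* The only name intended for use from other files. */
int new_id(void)
{
    return bump();
}
```

Another file may define its own `next_id' or `bump' without any
conflict at load time.
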
11. Restrictions 


The compiler is somewhat stickier about some constructions that 
used to be accepted. 


One difference is that external declarations made inside func- 
tions are remembered to the end of the file, that is even past 
the end of the function. The most frequent problem that this 
causes is that implicit declaration of a function as an integer 
in one routine, and subsequent explicit declaration of it as 
another type, is not allowed. This turned out to affect several 
source programs distributed with the system. 


It is now required that all forward references to labels inside a
function be the subject of a `goto'. This has turned out to
affect mainly people who pass a label to the routine `setexit'. In
fact a routine is supposed to be passed here, and why a label
worked I do not know.


In general this compiler makes it more difficult to use label 
variables. Think of this as a contribution to structured pro- 
gramming. 


The compiler now checks multiple declarations of the same name 
more carefully for consistency. It used to be possible to de- 
clare the same name to be a pointer to different structures; this 
is caught. So too are declarations of the same array as having 
different sizes. The exception is that array declarations with 
empty brackets may be used in conjunction with a declaration with 
a specified size. Thus 


int a[]; int a[50];


is acceptable (in either order). 
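
A current C compiler accepts the same pair of declarations at the top
level of a file, as this sketch shows. The function a_length is
invented for the example.

```c
int a[];        /* size left open */
int a[50];      /* size supplied; the two declarations are consistent */

/* Number of elements in a, known once the size has been supplied. */
int a_length(void)
{
    return (int)(sizeof a / sizeof a[0]);
}
```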

An external array all of whose definitions involve empty brackets
is diagnosed as `undefined' by the loader; it used to be taken as
having 1 element.